Optimized Fuzzy Text Alignment for Plagiarism Detection

Authors

  • Fernando Sánchez-Vega
  • Manuel Montes-y-Gómez
  • Luis Villaseñor Pineda
Abstract

This paper describes a method for plagiarism detection based on a fuzzy alignment between a given pair of documents. The proposed method assigns a weight to each word of the suspicious document according to the straightness of its alignment to the source document; this weight is used as a kind of plagiarism probability measure for each word of the suspicious document. The paper also presents a strategy to optimize the alignment of the two documents based on the evaluation of all possible matches in a limited context. Evaluation results on the test set of the PAN corpus show that the method is relatively fast and that it could detect 35% of the plagiarized words with accuracy greater than 50%.
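
As a rough illustration of the idea in the abstract (not the authors' actual algorithm, which this page does not detail), the sketch below gives each suspicious word a weight in [0, 1] by checking every occurrence of that word in the source and keeping the best overlap of neighbouring words inside a small window. The function name word_weights, the window parameter, and the overlap score are all assumptions made for this example.

```python
from collections import defaultdict


def word_weights(suspicious, source, window=2):
    """Return one weight in [0, 1] per suspicious word (hypothetical scoring)."""
    # Index every position at which each source word occurs.
    positions = defaultdict(list)
    for j, word in enumerate(source):
        positions[word].append(j)

    weights = []
    for i, word in enumerate(suspicious):
        # Neighbours of the suspicious word within `window` positions.
        lo, hi = max(0, i - window), min(len(suspicious), i + window + 1)
        context = suspicious[lo:i] + suspicious[i + 1:hi]
        best = 0.0
        # Evaluate every candidate match, but only inside a limited context.
        for j in positions.get(word, []):
            s_lo, s_hi = max(0, j - window), min(len(source), j + window + 1)
            src_context = set(source[s_lo:j] + source[j + 1:s_hi])
            if context:
                # Fraction of suspicious neighbours also found near position j.
                score = sum(w in src_context for w in context) / len(context)
            else:
                # A one-word document has no neighbours; the word match alone counts.
                score = 1.0
            best = max(best, score)
        weights.append(best)
    return weights


suspicious = "the quick brown fox".split()
source = "a quick brown fox jumps".split()
print(word_weights(suspicious, source, window=1))
```

Words with high weights would then be the candidates for plagiarized text; how the paper turns weights into detected passages is not stated here, so that step is left out.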

Similar resources

A Text Alignment Corpus for Persian Plagiarism Detection

This paper describes how a Persian text alignment corpus was constructed to evaluate plagiarism detection systems. This corpus is in PAN format and contains 11,089 documents and more than 11,603 plagiarism cases. Efforts were made to simulate various types of plagiarism manually, semi-automatically, or automatically in this large-scale corpus.

Overview of the 6th International Competition on Plagiarism Detection

This paper overviews 17 plagiarism detectors that have been evaluated within the sixth international competition on plagiarism detection at PAN 2014. We report on their performance for the two tasks of external plagiarism detection, source retrieval and text alignment. For the third year in a row, we invite software submissions instead of run submissions for this task, which allows for cross-ye...

Text Alignment Module in CoReMo 2.1 Plagiarism Detector Notebook for PAN at CLEF 2013

This paper describes the process and basics of the Text Alignment Module of the CoReMo 2.1 Plagiarism Detector, which won the Plagiarism Detection Text Alignment task in the PAN-2013 edition on both evaluation criteria, efficacy and efficiency, achieving the best detections and the best runtime. Its high detection efficacy is mainly due to the special features of the contextual n-gram...

Evaluation of Text Reuse Corpora for Text Alignment Task of Plagiarism Detection

This paper addresses the text alignment task of the 7th International Competition on Plagiarism Detection, PAN 2015. We investigate five submitted corpora and evaluate them based on their characteristics in two ways: manual and automatic evaluation. The results of the evaluation show that most plagiarism cases in the prepared corpora have a rather high quality in terms of “rate of obfuscation” alongsid...

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. For the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract keywords from suspicious documents and use them as queries to retrieve the plagiarism source documents. For the Text Alignment sub-task, a method based on sentence similarity is presented. Our text alignment...

Publication date: 2012